
distant heliograph operator had difficulty in receiving the flashes reliably from the sender, and it might therefore have been decided to repeat each flash three times, with the recipient using majority selection on each group of three to deduce the message. The capacity of the channel would thereby be reduced to one third.
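The triple-repetition scheme just described can be sketched as follows. This is a minimal illustration, assuming that each flash is represented as "1" and each absence of a flash as "0"; the function names `encode_repeat` and `decode_majority` are hypothetical. A single misread flash within any group of three is corrected by the majority vote, at the cost of sending three symbols per message symbol.

```python
from collections import Counter


def encode_repeat(bits: str, n: int = 3) -> str:
    """Repeat every symbol n times (assumption: '1' = flash, '0' = no flash)."""
    return "".join(b * n for b in bits)


def decode_majority(received: str, n: int = 3) -> str:
    """Split the received string into groups of n and keep the most frequent symbol of each."""
    groups = [received[i:i + n] for i in range(0, len(received), n)]
    return "".join(Counter(g).most_common(1)[0][0] for g in groups)


message = "1011"
sent = encode_repeat(message)  # "111000111111": 12 symbols carry 4 message symbols
# Suppose one flash is misread in each of two different groups.
received = "101000111011"
print(sent, decode_majority(received))
```

With an odd repetition count and two possible symbols there can be no tie, so the majority in each group is always well defined.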

In many practical cases, the physical medium for transmitting messages has to be shared by many different messages. It is a great advantage of optical communications that streams of photons of different wavelengths do not interfere with one another. Therefore, an optical fibre can carry many independent signals. Inside a cell, where the cytoplasm is a shared medium, many different molecules are present, and their independence is determined by the differential chemical affinities between pairs of molecules.

7.2 Coding

Coding refers to the transduction of a message into another form. It is ubiquitous in our world: ideas are encoded into words, music, and pictures; one language may be encoded into another; and so on. We have already made extensive use of binary coding; the compact disc-based recording industry today uses binary coding almost exclusively for music, pictures, and words. Evidently any number can be written in base 2; hence, a possible drill (algorithm) for binary coding consists of the following steps:

1. Assign a number to each state to be encoded;

2. Convert that number into base 2.
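The two steps of this drill can be sketched in a few lines. This is an illustration only: the names `to_base2` and `binary_code` are hypothetical, and numbering from 1 is an assumption chosen to match the DNA example in this section. Step 2 is carried out by repeated division by 2, each remainder giving the next binary digit, least significant first.

```python
def to_base2(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next digit (least significant first)
        n //= 2
    return "".join(reversed(digits)) or "0"


def binary_code(states: list[str], start: int = 1) -> dict[str, str]:
    """Step 1: assign a number to each state; step 2: write that number in base 2."""
    return {state: to_base2(number) for number, state in enumerate(states, start=start)}


print(binary_code(["A", "C", "T", "G"]))
# {'A': '1', 'C': '10', 'T': '11', 'G': '100'}
```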

A DNA sequence can thereby be converted into binary form by making the assignments A → 1, C → 2, T → 3, and G → 4, which in base 2 are 1, 10, 11, and 100, respectively. The coded sequence would have to be written with three digits per base (001, 010, etc.) and read in groups of three digits; otherwise "AA" (11) could be misinterpreted as "T" (11), and so forth. The reading frame is thus defined as the series of groups of three beginning with the first. Alternatively, separators can be introduced (see also the Huffman code described near the beginning of Sect. 7.4). DNA is an example of a usually nonoverlapping code of contiguous triplets.
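The fixed-width coding and its reading frame can be sketched as follows, using the assignments A → 1, C → 2, T → 3, G → 4 given above. The function names are hypothetical. `encode_fixed` pads every base to three binary digits, and `decode_fixed` reads groups of three starting from the first digit; `encode_unpadded` shows the ambiguity that arises when the minimal base-2 forms are simply concatenated.

```python
CODE = {"A": 1, "C": 2, "T": 3, "G": 4}  # assignments from the text
DECODE = {number: base for base, number in CODE.items()}


def encode_fixed(seq: str, width: int = 3) -> str:
    """Write each base as a zero-padded group of `width` binary digits."""
    return "".join(format(CODE[base], f"0{width}b") for base in seq)


def decode_fixed(bits: str, width: int = 3) -> str:
    """Read groups of `width` digits beginning with the first (the reading frame)."""
    return "".join(DECODE[int(bits[i:i + width], 2)] for i in range(0, len(bits), width))


def encode_unpadded(seq: str) -> str:
    """Concatenate the minimal base-2 forms with no padding and no separators."""
    return "".join(format(CODE[base], "b") for base in seq)


bits = encode_fixed("GATC")
print(bits, decode_fixed(bits))  # 100001011010 GATC
# Without fixed-width groups the boundaries are lost: "AA" and "T" both give "11".
print(encode_unpadded("AA") == encode_unpadded("T"))  # True
```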

Codes may be written as transformations, e.g.,

$$
\begin{array}{cccccc}
A & B & C & D & \cdots & Z \\
B & C & D & E & \cdots & A
\end{array},
$$

which could also be written down compactly by the instruction "replace each letter by the next one to the right", Z being replaced by A (encoded in this way, the instruction itself reads: sfqmbdf fbdi mfuufs cz uif ofyu pof up uif sjhiu). A scheme for recoding DNA could be

$$
\begin{array}{cccc}
A & C & T & G \\
1 & 2 & 3 & 4
\end{array}
$$
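Both transformations above are simple symbol-by-symbol substitutions, so they can be expressed as translation tables. The sketch below is illustrative only: `shift_code` and `DNA_RECODE` are hypothetical names, and the shift amount `k` is an added generalization (the text uses a shift of one). A shift of 25 undoes a shift of 1, which shows that this code is reversible.

```python
import string


def shift_code(text: str, k: int = 1) -> str:
    """Replace each letter by the one k places further on in the alphabet, Z wrapping round to A."""
    k %= 26
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(lower + upper, lower[k:] + lower[:k] + upper[k:] + upper[:k])
    return text.translate(table)  # characters that are not letters (e.g. spaces) pass through unchanged


# The DNA recoding scheme A -> 1, C -> 2, T -> 3, G -> 4 as a lookup table.
DNA_RECODE = str.maketrans("ACTG", "1234")

print(shift_code("replace each letter by the next one to the right"))
# sfqmbdf fbdi mfuufs cz uif ofyu pof up uif sjhiu
print("GATTACA".translate(DNA_RECODE))  # 4133121
print(shift_code(shift_code("Zebra"), k=25))  # Zebra
```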